Abstract: Digital music is widely used nowadays, and systems are needed that help users find the music they need. Increasing attention has been paid to learning feature representations from audio data for various music information retrieval (MIR) problems. The key element for relevant retrieval is the audio content representation. A good representation should be compact, efficient, and easy and fast to compute. In this work, the Bag-of-Frames (BoF) approach is evaluated. In this approach, low-level Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) features are extracted from the audio signal of songs. An encoding stage with a pre-computed codebook is then added, and a pooling stage yields a compact representation of the feature vectors. A Vector Quantization (VQ) encoding method using the Online Dictionary Learning (ODL) algorithm performs well in the query-by-example task of MIR and decreases the runtime of relevant retrieval. Experimental results show that PLP performs better than MFCC.
Keywords: Audio content representation; music information retrieval; MFCC; PLP; sparse coding; bag-of-frames; vector quantization.
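
As an illustration of the encoding and pooling stages summarized in the abstract, the following minimal sketch shows one way a Bag-of-Frames vector could be built and used for query-by-example. It rests on several assumptions that are not stated in the abstract: the codebook is taken as already given (here it is random rather than learned with ODL), pooling is a normalized histogram of codeword assignments, songs are ranked by cosine similarity, and random arrays stand in for real MFCC or PLP frames. The function names are hypothetical.

```python
import numpy as np


def vq_encode(frames, codebook):
    """Assign each frame to its nearest codeword (hard vector quantization).

    frames:   (n_frames, n_dims) array of low-level features
    codebook: (n_codewords, n_dims) array, assumed to be pre-computed
    """
    # Squared Euclidean distance between every frame and every codeword.
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bof_histogram(frames, codebook):
    """Pool the codeword assignments of one song into a normalized histogram."""
    idx = vq_encode(frames, codebook)
    counts = np.bincount(idx, minlength=codebook.shape[0]).astype(float)
    return counts / counts.sum()


def rank_by_similarity(query_vec, database_vecs):
    """Return database indices sorted from most to least similar (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    db = database_vecs / np.linalg.norm(database_vecs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))


# Toy data: random frames stand in for MFCC/PLP features (assumption).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 13))                      # 8 codewords, 13 dims
songs = [rng.normal(size=(200, 13)) for _ in range(5)]   # 5 songs, 200 frames each
database = np.stack([bof_histogram(s, codebook) for s in songs])
ranking = rank_by_similarity(database[2], database)      # song 2 used as the query
```

Because every song is reduced to one fixed-length vector, a query needs only a single matrix-vector product against the database, which is where the runtime saving of a compact representation comes from.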